A Bootstrapping Approach for Geographic Named Entity Annotation
Authors
Abstract
Geographic named entities can be classified into many subtypes that are useful for applications such as information extraction and question answering. In this paper, we present a bootstrapping algorithm for the task of geographic named entity annotation. In the initial stage, we annotate a raw corpus using seeds. From this initial annotation, boundary patterns are learned and applied to the corpus again to annotate new candidates. Type verification is adopted to reduce over-generation. The one-sense-per-discourse principle increases the number of positive instances and also corrects mistaken annotations. As the bootstrapping loop proceeds, the number of annotated instances grows gradually and the learned boundary patterns become richer.
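The bootstrapping loop summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are pre-tokenized lists, each entity name is a single token, a boundary pattern is simply the pair of words immediately to the left and right of an annotated entity, and verify_type is a hypothetical capitalization check standing in for the paper's type verification. All function names are illustrative.

```python
from collections import Counter, defaultdict


def context(doc, i):
    """Left and right neighbour of token i, padded at document edges."""
    left = doc[i - 1] if i > 0 else "<s>"
    right = doc[i + 1] if i + 1 < len(doc) else "</s>"
    return left, right


def annotate_seeds(docs, seeds):
    """Initial stage: label every token that matches a seed name."""
    return {(d, i): seeds[tok]
            for d, doc in enumerate(docs)
            for i, tok in enumerate(doc) if tok in seeds}


def learn_patterns(docs, ann):
    """Boundary pattern: (left word, right word) -> counts of entity types seen there."""
    patterns = defaultdict(Counter)
    for (d, i), etype in ann.items():
        patterns[context(docs[d], i)][etype] += 1
    return patterns


def verify_type(token):
    """Hypothetical stand-in for type verification: accept capitalized tokens only."""
    return token[:1].isupper()


def apply_patterns(docs, ann, patterns):
    """Propose new candidates whose context matches a learned boundary pattern."""
    new = {}
    for d, doc in enumerate(docs):
        for i, tok in enumerate(doc):
            ctx = context(doc, i)
            if (d, i) not in ann and ctx in patterns and verify_type(tok):
                new[(d, i)] = patterns[ctx].most_common(1)[0][0]
    return new


def one_sense_per_discourse(docs, ann):
    """Give every occurrence of a name in a document that document's majority type."""
    votes = defaultdict(Counter)  # (doc id, token) -> type counts
    for (d, i), etype in ann.items():
        votes[(d, docs[d][i])][etype] += 1
    return {(d, i): votes[(d, tok)].most_common(1)[0][0]
            for d, doc in enumerate(docs)
            for i, tok in enumerate(doc) if (d, tok) in votes}


def bootstrap(docs, seeds, max_iter=5):
    ann = one_sense_per_discourse(docs, annotate_seeds(docs, seeds))
    for _ in range(max_iter):
        new = apply_patterns(docs, ann, learn_patterns(docs, ann))
        if not new:  # no new candidates: the loop has converged
            break
        ann = one_sense_per_discourse(docs, {**ann, **new})
    return ann


docs = [["I", "visited", "Seoul", "in", "May"],
        ["We", "visited", "Busan", "in", "June", "and", "Busan", "was", "warm"],
        ["they", "visited", "friends", "in", "town"]]
print(bootstrap(docs, {"Seoul": "CITY"}))
# → {(0, 2): 'CITY', (1, 2): 'CITY', (1, 6): 'CITY'}
```

On this toy corpus, the pattern learned around the seed "Seoul" ("visited _ in") finds "Busan", type verification rejects the lowercase "friends", and the one-sense-per-discourse step propagates the label to the second "Busan" in the same document. The same step also replaces a minority label with the document's majority label, which is one way it can correct mistaken annotations.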
Similar resources
Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping
One of the issues in bootstrapping for named entity recognition is how to control the annotation errors introduced at every iteration. In this paper, we present several heuristics for reducing such errors using external resources such as WordNet, an encyclopedia, and Web documents. Bootstrapping is applied to identifying and classifying fine-grained geographic named entities, which are useful for ...
A Bootstrapping Approach for Training a NER with Conditional Random Fields
In this paper we present a bootstrapping approach for training a Named Entity Recognition (NER) system. Our method starts by annotating person names in a dataset of 50,000 news items. This is performed using a simple dictionary-based approach. Using this training set, we build a classification model based on Conditional Random Fields (CRF). We then use the inferred classification model to perfor...
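A simple dictionary-based annotation step of the kind this abstract mentions could look like the sketch below. It assumes greedy longest-match lookup and BIO labels (B-PER / I-PER / O), a common input format for CRF sequence labelers; neither detail is stated in the snippet, and dictionary_bio_tags is a hypothetical name.

```python
def dictionary_bio_tags(tokens, names):
    """Greedy longest-match lookup of multi-token names; returns one BIO tag per token."""
    entries = {tuple(n.split()) for n in names}
    max_len = max((len(e) for e in entries), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible name first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in entries:
                tags[i] = "B-PER"
                tags[i + 1:i + n] = ["I-PER"] * (n - 1)
                i += n
                break
        else:  # no dictionary entry starts here
            i += 1
    return tags


sentence = "Yesterday John Smith met Maria Silva in Lisbon".split()
print(dictionary_bio_tags(sentence, {"John Smith", "Maria Silva", "Smith"}))
# → ['O', 'B-PER', 'I-PER', 'O', 'B-PER', 'I-PER', 'O', 'O']
```

Longest match keeps "John Smith" as one name instead of tagging only the shorter entry "Smith"; the tagged sentences could then serve as (noisy) training data for a CRF.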
Bootstraping Information Extraction Using Regularity of Web Pages
To annotate web documents with metadata automatically, we must prepare a database that stores annotation targets and their metadata. In the case of location information, we need a database that stores many named entities (NEs) and their location information (i.e., telephone number and address). In this paper, we present a bootstrapping approach to extract triples. We describe our extraction met...
Optimising Selective Sampling for Bootstrapping Named Entity Recognition
Training a statistical named entity recognition system in a new domain requires costly manual annotation of large quantities of in-domain data. Active learning promises to reduce the annotation cost by selecting only highly informative data points. This paper is concerned with a real active learning experiment to bootstrap a named entity recognition system for a new domain of radio astronomical...
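The snippet does not say how "highly informative" data points are chosen. One common selective-sampling criterion is uncertainty sampling, sketched below under that assumption: each unlabeled sentence is scored by the mean entropy of the current model's per-token label distributions, and the k highest-scoring sentences are sent for manual annotation. The function names and the input format are hypothetical.

```python
import math


def token_entropy(dist):
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def select_most_informative(sentences, k):
    """Return indices of the k sentences with the highest mean token entropy.

    `sentences` holds, for each unlabeled sentence, a list of per-token
    label distributions produced by the current model.
    """
    scores = [sum(token_entropy(d) for d in sent) / len(sent) if sent else 0.0
              for sent in sentences]
    return sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:k]


pool = [
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # model is confident
    [[0.40, 0.35, 0.25], [0.50, 0.50, 0.00]],  # model is uncertain
    [[0.90, 0.05, 0.05]],
]
print(select_most_informative(pool, 2))  # → [1, 2]
```

Averaging over tokens keeps long sentences from winning merely by length; summing instead would favour them, which may or may not be desirable when annotation cost grows with sentence length.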
Grounding Spatial Named Entities For Information Extraction And Question Answering
The task of named entity annotation of unseen text has recently been automated with near-human performance. However, the full task involves more than annotation, i.e., identifying the scope of each (continuous) text span and its class (such as place name). It also involves grounding the named entity (i.e., establishing its denotation with respect to the world or a model). The latter aspec...